Relabeling Syntax Trees to Improve Syntax-Based Machine Translation Quality
Abstract
We identify problems with the Penn Treebank that render it imperfect for syntax-based machine translation and propose methods of relabeling the syntax trees to improve translation quality. We develop a system incorporating a handful of relabeling strategies that yields a statistically significant improvement of 2.3 BLEU points over a baseline syntax-based system.
Similar resources
Can Semantic Roles Improve Syntax-Based Machine Translation?
This paper compares the performance of a Tree-to-string (TTS) transducer with automatically generated/gold-standard parse trees and semantic roles. Experimental results show that improving the parsing quality can lead to significant improvement in MT performance and adding semantic roles in the syntax tree labels does not improve the TTS transducer. Another approach of using semantic roles: ske...
Binarizing Syntax Trees to Improve Syntax-Based Machine Translation Accuracy
We show that phrase structures in Penn Treebank style parses are not optimal for syntax-based machine translation. We exploit a series of binarization methods to restructure the Penn Treebank style trees such that syntactified phrases smaller than Penn Treebank constituents can be acquired and exploited in translation. We find that by employing the EM algorithm for determining the binarization o...
Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
We propose a novel technique for learning how to transform source parse trees to improve the translation quality of syntax-based translation models using synchronous context-free grammars. We transform the source tree phrasal structure into a set of simpler structures, expose such decisions to the decoding process, and find the least expensive transformation operation to better model word ...
A new model for persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the different parts. Persian has multi-part words as well. Under Persian morphology, a half-space character is needed to separate the parts of multi-part words, but in many cases people incorrectly use a space character instead. This common incorrect use of the space character leads to some s...
Statistical Translation Model Based On Source Syntax Structure
Syntax-based statistical translation models have been shown to be better than phrase-based models, especially for language pairs with very different syntactic structures, such as Chinese and English. In this talk I will introduce a series of statistical translation models based on source syntax structure. The tree-based model uses the one-best syntax tree for translation. The forest-based model uses a comp...
Publication date: 2006